Bilingual Sense Similarity for Statistical Machine Translation

نویسندگان

  • Boxing Chen
  • George F. Foster
  • Roland Kuhn
چکیده

This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora. The sense similarity scores are computed by using the vector space model. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. Similarity scores are used as additional features of the translation model to improve translation performance. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

End-to-end statistical machine translation with zero or small parallel texts

We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without the use of any bilingual sentence-aligned parallel corpora. We present detailed analysis of the accuracy of bilingual lexicon induction, and show how a discriminative model can be used to combine various sign...

متن کامل

Chinese-Japanese Clause Alignment

Bi-text alignment is useful to many Natural Language Processing tasks such as machine translation, bilingual lexicography and word sense disambiguation. This paper presents a Chinese-Japanese alignment at the level of clause. After describing some characteristics in Chinese-Japanese bilingual texts, we first investigate some statistical properties of Chinese-Japanese bilingual corpus, including...

متن کامل

Exploiting Aggregate Properties of Bilingual Dictionaries For Distinguishing Senses of English Words and Inducing English Sense Clusters

We propose a novel method for inducing monolingual semantic hierarchies and sense clusters from numerous foreign-language-to-English bilingual dictionaries. The method exploits patterns of non-transitivity in translations across multiple languages. No complex or hierarchical structure is assumed or used in the input dictionaries: each is initially parsed into the “lowest common denominator” for...

متن کامل

Bilingual Correspondence Recursive Autoencoder for Statistical Machine Translation

Learning semantic representations and tree structures of bilingual phrases is beneficial for statistical machine translation. In this paper, we propose a new neural network model called Bilingual Correspondence Recursive Autoencoder (BCorrRAE) to model bilingual phrases in translation. We incorporate word alignments into BCorrRAE to allow it freely access bilingual constraints at different leve...

متن کامل

Bilingual Distributed Phrase Representations for Statistical Machine Translation

Phrase–based machine translation (PBMT) relies upon the phrase-table as the main resource for bilingual knowledge at decoding time. A phrase table in its basic form includes aligned phrases along with four probabilities indicating aspects of the co-occurrence statistics for each phrase pair. In this paper we add a new semantic similarity score as a statistical feature to enrich the phrase table...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010